Abstract

In this paper, we formalize the method of dual fitting and the idea of a factor-revealing LP.
This combination is used to design and analyze two greedy algorithms for the metric uncapacitated
facility location problem. Their approximation factors are 1.861 and 1.61, with running
times of $O(m \log m)$ and $O(n^3)$, respectively, where $n$ is the total number of vertices and $m$ is
the number of edges in the underlying complete bipartite graph between cities and facilities. The
algorithms are used to improve recent results for several variants of the problem.



1 Introduction 

A large fraction of the theory of approximation algorithms, as we know it today, is built around
the theory of linear programming, which offers the two fundamental algorithm design techniques
of rounding and the primal-dual schema (see [44]). Interestingly enough, the LP-duality based
analysis [30, ?] for perhaps the most central problem of this theory, the set cover problem, did not
use either of these techniques. Moreover, the analysis used for set cover does not seem to have found
use outside of this problem and its generalizations [?], leading to a somewhat unsatisfactory state
of affairs.

In this paper, we formalize the technique used for analyzing set cover as the method of dual fitting,
and we also introduce the idea of using a factor-revealing LP. Using this combination we analyze
two greedy algorithms for the metric uncapacitated facility location problem. Their approximation
factors are 1.861 and 1.61, with running times of $O(m \log m)$ and $O(n^3)$ respectively, where $m$ and
$n$ denote the total number of edges and vertices in the underlying complete bipartite graph between
cities and facilities. In other words, $m = n_c \times n_f$ and $n = n_c + n_f$, where $n_c$ is the number of cities
and $n_f$ is the number of facilities.

1.1 Dual fitting with factor-revealing LP 

The set cover problem offers a particularly simple setting for illustrating most of the dominant ideas
in approximation algorithms (see [44]). Perhaps the reason that the method of dual fitting was not
clear so far was that the set cover problem did not require its full power. However, in retrospect,
its salient features are best illustrated again in the simple setting of the set cover problem; we do
this in a later section.

The method of dual fitting can be described as follows, assuming a minimization problem: The basic
algorithm is combinatorial; in the case of set cover it is in fact a simple greedy algorithm. Using
the linear programming relaxation of the problem and its dual, one first interprets the combinatorial
algorithm as a primal-dual-type algorithm, that is, an algorithm that iteratively makes primal and dual
updates. Strictly speaking, this is not a primal-dual algorithm, since the dual solution computed
is, in general, infeasible (this issue is discussed in a later section). However, one shows that the
primal integral solution found by the algorithm is fully paid for by the dual computed. By fully paid
for we mean that the objective function value of the primal solution is bounded by that of the dual.
The main step in the analysis consists of dividing the dual by a suitable factor, say $\gamma$, and showing
that the shrunk dual is feasible, i.e., it fits into the given instance. The shrunk dual is then a lower
bound on OPT, and $\gamma$ is the approximation guarantee of the algorithm.
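The recipe above can be illustrated on set cover. The following is a minimal sketch, not taken from the paper: the instance, the set names, and the helper functions are all hypothetical, and the shrinking factor used is the harmonic number $H_n$, the classical guarantee for greedy set cover. It runs the greedy algorithm, records the price each element pays, and checks that the prices pay for the solution, are not dual feasible as they stand, and become feasible after dividing by $H_n$.

```python
from fractions import Fraction

def greedy_set_cover(universe, sets, cost):
    """Greedy set cover; returns the chosen set names and the price paid by each element."""
    uncovered = set(universe)
    chosen, price = [], {}
    while uncovered:
        # Most cost-effective set: cost per newly covered element.
        name = min(
            (s for s in sets if sets[s] & uncovered),
            key=lambda s: cost[s] / len(sets[s] & uncovered),
        )
        newly = sets[name] & uncovered
        for e in newly:
            price[e] = cost[name] / len(newly)
        chosen.append(name)
        uncovered -= newly
    return chosen, price

def harmonic(n):
    return sum(Fraction(1, i) for i in range(1, n + 1))

def is_dual_feasible(y, sets, cost):
    # Dual constraint of set cover: the elements of a set may not pay more than its cost.
    return all(sum(y[e] for e in members) <= cost[s] for s, members in sets.items())

# Hypothetical instance: three singletons and one set covering everything.
universe = {1, 2, 3}
sets = {"A": {1, 2, 3}, "S1": {1}, "S2": {2}, "S3": {3}}
cost = {"A": Fraction(11, 10), "S1": Fraction(1), "S2": Fraction(1, 2), "S3": Fraction(1, 3)}

chosen, price = greedy_set_cover(universe, sets, cost)
gamma = harmonic(len(universe))
shrunk = {e: p / gamma for e, p in price.items()}
```

On this instance the greedy solution costs 11/6 while the single set A costs 11/10, so the unshrunk prices violate the constraint for A, and dividing by $H_3 = 11/6$ restores feasibility.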

Clearly, we need to find the minimum $\gamma$ that suffices. Equivalently, this amounts to finding the
worst possible instance: one in which the dual solution needs to be shrunk the most in order to be
rendered feasible. For each value of $n_c$, the number of cities, we define a factor-revealing LP that
encodes the problem of finding the worst possible instance with $n_c$ cities as a linear program. This
gives a family of LPs, one for each value of $n_c$. The supremum of the optimal solutions to these
LPs is then the best value for $\gamma$. In our case, we do not know how to compute this supremum
directly. Instead, we obtain a feasible solution to the dual of each of these LPs. An upper bound on
the objective function values of these duals can be computed, and is an upper bound on the optimal
$\gamma$. In our case, this upper bound is 1.861 for the first algorithm and 1.61 for the second one. In
order to get a closely matching tight example, we numerically solve the factor-revealing LP for a
large value of $n_c$.

The technique of factor-revealing LPs is similar to the idea of LP bounds in coding theory. LP bounds
give the best known bounds on the minimum distance of a code with a given rate by bounding the
solution of a linear program (cf. McEliece et al. [?]). In the context of approximation algorithms,
Goemans and Kleinberg [?] use a similar method in the analysis of their algorithm for the minimum
latency problem.



1.2 The facility location problem 

In the (uncapacitated) facility location problem, we have a set $\mathcal{F}$ of $n_f$ facilities and a set $\mathcal{C}$ of $n_c$
cities. For every facility $i \in \mathcal{F}$, a nonnegative number $f_i$ is given as the opening cost of facility $i$.
Furthermore, for every facility $i \in \mathcal{F}$ and city $j \in \mathcal{C}$, we have a connection cost (a.k.a. service cost)
$c_{ij}$ between facility $i$ and city $j$. The objective is to open a subset of the facilities in $\mathcal{F}$, and connect
each city to an open facility so that the total cost is minimized. We will consider the metric version
of this problem, i.e., the connection costs satisfy the triangle inequality.
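To make the objective concrete, here is a minimal brute-force sketch for tiny instances. It is only an illustration: the function names and the two-facility, three-city instance are hypothetical, and enumerating all subsets of facilities is exponential, so this is useful for checking small examples only.

```python
from itertools import combinations

def solution_cost(open_facilities, f, c):
    """Opening costs plus each city's connection cost to its cheapest open facility.

    f[i] is the opening cost of facility i; c[i][j] is the connection cost
    between facility i and city j.
    """
    n_cities = len(c[0])
    opening = sum(f[i] for i in open_facilities)
    connection = sum(min(c[i][j] for i in open_facilities) for j in range(n_cities))
    return opening + connection

def brute_force_optimum(f, c):
    """Try every nonempty subset of facilities and keep the cheapest."""
    best = None
    for size in range(1, len(f) + 1):
        for subset in combinations(range(len(f)), size):
            cost = solution_cost(subset, f, c)
            if best is None or cost < best[0]:
                best = (cost, subset)
    return best

# Hypothetical instance: 2 facilities, 3 cities.
f = [3, 1]
c = [[1, 2, 5],
     [4, 4, 1]]
best_cost, best_subset = brute_force_optimum(f, c)
```

Here opening only facility 0 costs 11, opening only facility 1 costs 10, and opening both costs 8.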

This problem has occupied a central place in operations research since the early 60's [?, 26, ?, ?, 41],
and has been studied from the perspectives of worst case analysis, probabilistic analysis, polyhedral
combinatorics and empirical heuristics (see [11, ?]). Although the first approximation algorithm for
this problem, a greedy algorithm achieving a guarantee of $O(\log n)$ in the general (non-metric) case
due to Hochbaum [?], dates back almost 20 years, renewed interest in recent years has resulted
in much progress. Recently, the problem has found several new applications in network design
problems such as placement of routers and caches [16, 29], agglomeration of traffic or data,
and web server replications in a content distribution network (CDN) [?].

The first constant factor approximation algorithm for this problem was given by Shmoys, Tardos,
and Aardal [?]. Later, the factor was improved by Chudak and Shmoys [?] to $1 + 2/e$. Both these
algorithms were based on LP-rounding, and therefore had high running times.

Jain and Vazirani [22] gave a primal-dual algorithm, achieving a factor of 3, and having the same
running time as ours (we will refer to this as the JV algorithm). Their algorithm was adapted for
solving several related problems such as the fault-tolerant and outlier versions, and the $k$-median
problem [23, ?]. Mettu and Plaxton [34] used a restatement of the JV algorithm for the on-line
median problem.

Strategies based on local search and greedy improvement for the facility location problem have also
been studied. The work of Korupolu et al. shows that a simple local search heuristic proposed
by Kuehn and Hamburger yields a $(5 + \epsilon)$-approximation algorithm with a running time
of $O(n^{?} \log n/\epsilon)$, for any $\epsilon > 0$. Charikar and Guha [?] improved the factor slightly to 1.728 by
combining the JV algorithm, greedy augmentation, and the LP-based algorithm [?]. They also combined
greedy improvement and cost scaling to improve the factor of the JV algorithm to 1.853. For
a metric defined by a sparse graph, Thorup [43] has obtained a $(3 + o(1))$-approximation algorithm
with running time $\tilde{O}(|E|)$. Regarding hardness results, Guha and Khuller [15] showed that the best
approximation factor possible for this problem is 1.463, assuming $NP \not\subseteq DTIME[n^{O(\log \log n)}]$.

Since the publication of the first draft of the present paper, two new algorithms have been proposed
for the facility location problem. The first algorithm, due to Sviridenko [42], uses the LP-rounding
method to achieve an approximation factor of 1.58. The second algorithm, due to Mahdian, Ye, and
Zhang [32], combines our second algorithm with the idea of cost scaling to achieve an approximation
factor of 1.52, which is currently the best known factor for this problem.



1.3 Our results 



Our first algorithm is quite similar to the greedy set cover algorithm: iteratively pick the most cost-effective
choice at each step, where cost-effectiveness is measured as the ratio of the cost incurred
to the number of new cities served. In order to use LP-duality to analyze this algorithm, we give an 
alternative description which can be seen as a modification of the JV algorithm - when a city gets 
connected to an open facility, it withdraws whatever it has contributed towards the opening cost of 
other facilities. This step of withdrawing contribution is important, since it ensures that the primal 
solution is fully paid for by the dual. 

The second algorithm has a minor difference with the first one: A city might change the facility to 
which it is connected and connect to a closer facility. If so, it offers this difference toward opening 
the latter facility. 

The approximation factors of the algorithms are 1.861 and 1.61, with running times of $O(m \log m)$
and $O(n^3)$ respectively, where $n$ is the total number of vertices and $m$ is the number of edges in the
underlying complete bipartite graph between cities and facilities.

We have tested our algorithms on randomly generated instances as well as instances obtained
from the Operations Research library and the GT-ITM Internet topology generator [?]. The cost
of the integral solution found is compared against the solution of the LP-relaxation of the problem,
rather than OPT (computing which would be prohibitively time consuming). The results are
encouraging: The average error of our algorithms is about 3% and 1% respectively, and is a significant
improvement over the JV algorithm, which has an error of even 100% in some cases.

The primal-dual algorithm of Jain and Vazirani [22] is versatile in that it can be used to obtain
algorithms for many variants of the facility location problem, such as $k$-median [22], a common
generalization of $k$-median and facility location [22], capacitated facility location with soft
capacities [22], prize collecting facility location [?], and facility location with outliers. In a later section, we
apply our algorithms to several variants of the problem. First, we consider a common generalization
of the facility location and $k$-median problems. In this problem, which we refer to as the $k$-facility
location problem, an instance of the facility location problem and an integer $k$ are given and the
objective is to find the cheapest solution that opens at most $k$ facilities. The $k$-median problem is
a special case of this problem in which all opening costs are 0. The $k$-median problem is studied
extensively [?, 6, ?, ?] and the best known approximation algorithm for this problem, due to Arya et
al. [?], achieves a factor of $3 + \epsilon$. The $k$-facility location problem has also been studied in operations
research [11], and the best previously known approximation factor for this problem was 6 [?].

Next, we show an application of our algorithm to the facility location game. We also use our
algorithm to improve recent results for some other variants of the problem. In the facility location
problem with outliers we are not required to connect all cities to open facilities. We consider two
versions of this variant: In the robust version, we are allowed to leave $l$ cities unconnected. In
facility location with penalties we can either connect a city to a facility, or pay a specified penalty.
Both versions were motivated by commercial applications, and were proposed by Charikar et al.
In this paper we will modify our algorithm to obtain a factor 2 approximation algorithm for these
versions, improving the best known result of factor 3 [?].

In the fault tolerant variant, each city has a specified number of facilities it should be connected to.
This problem was proposed in [23] and the best factor known is 2.47 [18]. We can achieve a factor
of 1.61 when all cities have the same connectivity requirement. In addition, we introduce a new
variant which can be seen as a special case of the concave cost version of this problem: the cost of
opening a facility at a location is specified and it can serve exactly one city. In addition, a setup
cost is charged the very first time a facility is opened at a given location.



2 Algorithm 1 

In the following algorithm we use a notion of cost effectiveness. Let us say that a star consists of one
facility and several cities. The cost of a star is the sum of the opening cost of the facility and the
connection costs between the facility and all the cities in the star. More formally, the cost of the star
$(i, C')$, where $i$ is a facility and $C' \subseteq \mathcal{C}$ is a subset of cities, is $f_i + \sum_{j \in C'} c_{ij}$. The cost effectiveness
of the star $(i, C')$ is the ratio of the cost of the star to the size of $C'$, i.e., $(f_i + \sum_{j \in C'} c_{ij}) / |C'|$.

Algorithm 1 

1. Let $U$ be the set of unconnected cities. In the beginning, all cities are unconnected, i.e., $U := \mathcal{C}$,
and all facilities are unopened.

2. While $U \neq \emptyset$:

• Among all stars, find the most cost-effective one, $(i, C')$, open facility $i$, if it is not already
open, and connect all cities in $C'$ to $i$.

• Set $f_i := 0$, $U := U \setminus C'$.

Note that a facility can be chosen again after being opened, but its opening cost is counted only
once since we set $f_i$ to zero after the first time the facility is picked by the algorithm. As far as cities
are concerned, every city $j$ is removed from $U$ when connected to an open facility, and is not taken
into consideration again. Also, notice that although the number of stars is exponentially large, in
each iteration the most cost-effective star can be found in polynomial time. For each facility $i$, we
can sort the cities in increasing order of their connection cost to $i$. It can be easily seen that the
most cost-effective star will consist of a facility and a set containing the first $k$ cities in this order,
for some $k$.

The idea of cost effectiveness essentially stems from a similar notion in the greedy algorithm for
the set cover problem. In that algorithm, the cost effectiveness of a set $S$ is defined to be the cost
of $S$ over the number of uncovered elements in $S$. In each iteration, the algorithm picks the most
cost-effective set until all elements are covered. The most cost-effective set can be found either by
using direct computation, or by using the dual program of the linear programming formulation for
the problem. The dual program can also be used to prove the approximation factor of the algorithm.
Similarly, we will use the LP-formulation of facility location to analyze our algorithm. As we will
see, the dual formulation of the problem helps us to understand the nature of the problem and the
greedy algorithm.

The facility location problem can be captured by an integer program due to Balinski [?]. For the
sake of convenience, we give another equivalent formulation for the problem. Let $\mathcal{S}$ be the set of
all stars. The facility location problem can be thought of as picking a minimum cost set of stars
such that each city is in at least one star. This problem can be captured by the following integer
program. In this program, $x_S$ is an indicator variable denoting whether star $S$ is picked and $c_S$
denotes the cost of star $S$.

\[
\begin{array}{lll}
\text{minimize} & \sum_{S \in \mathcal{S}} c_S x_S & \qquad (1) \\
\text{subject to} & \forall j \in \mathcal{C}: \ \sum_{S : j \in S} x_S \ge 1 & \\
 & \forall S \in \mathcal{S}: \ x_S \in \{0, 1\} &
\end{array}
\]

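The LP-relaxation of this program is:

\[
\begin{array}{lll}
\text{minimize} & \sum_{S \in \mathcal{S}} c_S x_S & \qquad (2) \\
\text{subject to} & \forall j \in \mathcal{C}: \ \sum_{S : j \in S} x_S \ge 1 & \\
 & \forall S \in \mathcal{S}: \ x_S \ge 0 &
\end{array}
\]

The dual program is:

\[
\begin{array}{lll}
\text{maximize} & \sum_{j \in \mathcal{C}} \alpha_j & \qquad (3) \\
\text{subject to} & \forall S \in \mathcal{S}: \ \sum_{j \in S \cap \mathcal{C}} \alpha_j \le c_S & \\
 & \forall j \in \mathcal{C}: \ \alpha_j \ge 0 &
\end{array}
\]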

There is an intuitive way of interpreting the dual variables. We can think of $\alpha_j$ as the contribution
of city $j$, or its share toward the total expenses. Note that the first inequality of the dual can also
be written as $\sum_{j \in \mathcal{C}} \max(0, \alpha_j - c_{ij}) \le f_i$ for every facility $i$. We can now see how the dual variables
can help us find the most cost-effective star in each iteration of the greedy algorithm: if we start
raising the dual variables of all unconnected cities simultaneously, the most cost-effective star will be
the first star $(i, C')$ for which $\sum_{j \in C'} \max(0, \alpha_j - c_{ij}) = f_i$. Hence we can restate Algorithm 1 based
on the above observation. This is in complete analogy to the greedy algorithm and its restatement
using the LP-formulation for set cover.
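The correspondence between cost effectiveness and uniformly raised duals can be checked numerically for a single facility. The sketch below is illustrative only: the function names and the numbers are hypothetical, and it assumes $f_i > 0$ and that all cities start raising their duals from 0 at the same time. It computes the best cost effectiveness over prefixes of the cities sorted by connection cost, and separately the time $t$ at which $\sum_j \max(0, t - c_{ij}) = f_i$.

```python
from fractions import Fraction

def best_prefix_star(f_i, costs):
    """Best cost effectiveness among stars made of the k cheapest cities, with its k."""
    ordered = sorted(costs)
    best = None
    total = Fraction(f_i)
    for k, cost in enumerate(ordered, start=1):
        total += cost
        ratio = total / k
        if best is None or ratio < best[0]:
            best = (ratio, k)
    return best

def tight_time(f_i, costs):
    """Time t at which sum_j max(0, t - c_ij) equals f_i (assumes f_i > 0)."""
    ordered = sorted(costs)
    n = len(ordered)
    for k in range(1, n + 1):
        # Candidate t if exactly the k cheapest cities are contributing.
        t = (Fraction(f_i) + sum(ordered[:k])) / k
        covers_first_k = t >= ordered[k - 1]
        excludes_rest = k == n or t <= ordered[k]
        if covers_first_k and excludes_rest:
            return t
    raise ValueError("no tight time found; is f_i positive?")

# Hypothetical facility with opening cost 3 and three cities.
ratio, size = best_prefix_star(3, [1, 2, 5])
t = tight_time(3, [1, 2, 5])
```

For these numbers the best star uses the two cheapest cities with cost effectiveness 3, and the facility becomes fully paid at time 3 as well.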

Restatement of Algorithm 1 

1. We introduce a notion of time, so that each event can be associated with the time at which
it happened. The algorithm starts at time 0. Initially, each city is defined to be unconnected
($U := \mathcal{C}$), all facilities are unopened, and $\alpha_j$ is set to 0 for every $j$.

2. While $U \neq \emptyset$, increase the time, and simultaneously, for every city $j \in U$, increase the
parameter $\alpha_j$ at the same rate, until one of the following events occurs (if two events occur at
the same time, we process them in arbitrary order).

(a) For some unconnected city $j$, and some open facility $i$, $\alpha_j = c_{ij}$. In this case, connect
city $j$ to facility $i$ and remove $j$ from $U$.

(b) For some unopened facility $i$, we have $\sum_{j \in U} \max(0, \alpha_j - c_{ij}) = f_i$. This means that the
total contribution of the cities is sufficient to open facility $i$. In this case, open this facility,
and for every unconnected city $j$ with $\alpha_j \ge c_{ij}$, connect $j$ to $i$, and remove it from $U$.

In each iteration of Algorithm 1 the process of opening a facility and/or connecting some cities will
be defined as an event. It is easy to prove the following lemma by induction.

Lemma 1 The sequence of events executed by Algorithm 1 and its restatement are identical. 

Proof: By induction. □ 
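Algorithm 1 in its greedy form can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it does not attempt the $O(m \log m)$ running time, the function name and the instance are hypothetical, and ties are broken by facility and city index. The value recorded as `alpha[j]` is the cost effectiveness of the star that connects city $j$, which corresponds to the time at which $j$ gets connected in the restatement.

```python
from fractions import Fraction

def algorithm1(f, c):
    """Greedy star selection.

    f[i]: opening cost of facility i; c[i][j]: connection cost between
    facility i and city j. Returns (opened facilities in order,
    assignment city -> facility, alpha city -> contribution).
    """
    remaining = list(f)                      # set to 0 once facility i is opened
    unconnected = set(range(len(c[0])))
    opened, assignment, alpha = [], {}, {}
    while unconnected:
        best = None                          # (cost effectiveness, facility, cities)
        for i in range(len(f)):
            # The best star of facility i is a prefix of the cities sorted by cost.
            ordered = sorted(unconnected, key=lambda j: (c[i][j], j))
            total = Fraction(remaining[i])
            for k, j in enumerate(ordered, start=1):
                total += c[i][j]
                ratio = total / k
                if best is None or ratio < best[0]:
                    best = (ratio, i, ordered[:k])
        ratio, i, cities = best
        if i not in opened:
            opened.append(i)
        remaining[i] = 0
        for j in cities:
            assignment[j] = i
            alpha[j] = ratio
        unconnected -= set(cities)
    return opened, assignment, alpha

# Hypothetical instance: 2 facilities, 3 cities.
f = [3, 1]
c = [[1, 2, 5],
     [4, 4, 1]]
opened, assignment, alpha = algorithm1(f, c)
total_cost = sum(f[i] for i in opened) + sum(c[i][j] for j, i in assignment.items())
```

On this instance the first star is facility 1 with city 2 (cost effectiveness 2), followed by facility 0 with cities 0 and 1 (cost effectiveness 3), and the total cost of 8 equals the sum of the recorded contributions.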

This restatement can also be seen as a modification of the JV algorithm. The only difference is that
in the JV algorithm cities, when connected to an open facility, are not excluded from $U$; hence they
might contribute towards opening several facilities. Due to this fact they have a second cleanup
phase in which some of the already open facilities will be closed down.

Also, it is worth noting that despite the similarity between Algorithm 1 and Hochbaum's greedy
algorithm for facility location (which is equivalent to the set cover algorithm applied on the set of
stars), they are not equivalent. This is because we set $f_i$ to zero after picking a star containing facility $i$.
As the following example shows, the approximation factor of Hochbaum's algorithm is ^{\^^^) 
on instances with metric inequality: Consider k facilities with opening cost located in the same 
place Also k — \ groups of cities Si, S2, . . . ., Sk-i- The group Si consists of p'^^^+i cities with 
distance J2j=i...iP'~^ from the facilities. Other distances are obtained from the triangle inequality.
Hochbaum's algorithm opens all facilities and therefore its solution costs more than kp^ . The 
optimum solution is p'' + Y^i=i„,k-i J2j=i...iP^~^- show that with a careful choice of k, 

the ratio of these two expressions is ^(i^^^)- We do not know whether the approximation factor 
of Hochbaum's algorithm on metric instances is strictly less than logn or not. 

3 Analysis of Algorithm 1 

In this section we will give an LP-based analysis of the algorithm. As stated before, the contribution
of each city goes towards opening at most one facility and connecting the city to an open facility.
Therefore, the total cost of the solution produced by our algorithm will be equal to the sum $\sum_j \alpha_j$
of the contributions. However, $\alpha$ is not a feasible dual solution as it was in the JV algorithm. The
reason is that in every iteration of the restatement of Algorithm 1, we exclude a subset of cities and
withdraw their contribution from all facilities. So at the end, for some facility $i$, $\sum_j \max(\alpha_j - c_{ij}, 0)$
can be greater than $f_i$ and hence the corresponding constraint of the dual program is violated.

However, if we find a $\gamma$ for which $\alpha/\gamma$ is feasible, $\sum_j \alpha_j/\gamma$ would be a lower bound to the optimum
and therefore the approximation factor of the algorithm would be at most $\gamma$. This observation
motivates the following definition.

Definition Given $\alpha_j$ ($j = 1, \ldots, n_c$), a facility $i$ is called at most $\gamma$-overtight if and only if

\[ \sum_j \max(\alpha_j/\gamma - c_{ij}, 0) \le f_i. \]
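The definition translates directly into a check. The sketch below is an illustration with hypothetical numbers (two cities with contributions 1 and 2, and a facility with opening cost 1 at distances 0 and 1); the function name is not from the paper.

```python
from fractions import Fraction

def is_at_most_overtight(alpha, c_i, f_i, gamma):
    """True if sum_j max(alpha_j / gamma - c_ij, 0) <= f_i for this facility."""
    shrunk_total = sum(max(Fraction(a) / gamma - c, 0) for a, c in zip(alpha, c_i))
    return shrunk_total <= f_i

alpha = [1, 2]     # hypothetical contributions of two cities
c_i = [0, 1]       # hypothetical connection costs to facility i
f_i = 1            # hypothetical opening cost

unshrunk_ok = is_at_most_overtight(alpha, c_i, f_i, 1)
shrunk_ok = is_at_most_overtight(alpha, c_i, f_i, Fraction(3, 2))
```

With $\gamma = 1$ the facility receives $1 + 1 = 2 > 1$, so the dual constraint is violated; with $\gamma = 3/2$ it receives $2/3 + 1/3 = 1$, so the facility is at most $3/2$-overtight.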

Using the above definition, it is trivial that $\alpha/\gamma$ is a feasible dual if and only if each facility is at
most $\gamma$-overtight. Now, we want to find such a $\gamma$. Note that in the above sum we only need to
consider the cities $j$ for which $\alpha_j \ge \gamma c_{ij}$. Let us assume without loss of generality that this is the case
only for the first $k$ cities. Moreover, assume without loss of generality that $\alpha_1 \le \alpha_2 \le \cdots \le \alpha_k$.
The next two lemmas express the constraints on $\alpha$ imposed by the problem or our algorithm. The
first lemma mainly captures the metric property and the second one expresses the fact that the total
contribution offered to a facility at any time during the algorithm is no more than its cost.

Lemma 2 For every two cities $j$ and $j'$ and facility $i$, $\alpha_j \le \alpha_{j'} + c_{ij'} + c_{ij}$.

Proof: If $\alpha_{j'} \ge \alpha_j$, the inequality obviously holds. Assume $\alpha_j > \alpha_{j'}$. Let $i'$ be the facility that
city $j'$ is connected to by our algorithm. Thus, facility $i'$ is open at time $\alpha_{j'}$. The contribution $\alpha_j$
cannot be greater than $c_{i'j}$ because in that case city $j$ could be connected to facility $i'$ at some time
$t < \alpha_j$. Hence $\alpha_j \le c_{i'j}$. Furthermore, by the triangle inequality, $c_{i'j} \le c_{i'j'} + c_{ij'} + c_{ij} \le \alpha_{j'} + c_{ij'} + c_{ij}$.
□

Lemma 3 For every city $j$ and facility $i$, $\sum_{k=j}^{n_c} \max(\alpha_j - c_{ik}, 0) \le f_i$.

Proof: Assume, for the sake of contradiction, that for some $j$ and some $i$ the inequality does
not hold, i.e., $\sum_{k=j}^{n_c} \max(\alpha_j - c_{ik}, 0) > f_i$. By the ordering on cities, for $k \ge j$, $\alpha_k \ge \alpha_j$. Let time
$t = \alpha_j$. By the assumption, facility $i$ is fully paid for before time $t$. For any city $k$, $j \le k \le n_c$, for
which $\alpha_j - c_{ik} > 0$, the edge $(i, k)$ must be tight before time $t$. Moreover, there must be at least one
such city. For this city, $\alpha_k < \alpha_j$, since the algorithm will stop growing $\alpha_k$ as soon as $k$ has a tight
edge to a fully paid for facility. The contradiction establishes the lemma. □

Subject to the constraints introduced by Lemmas 2 and 3, we want to find the minimum $\gamma$ for
which $\sum_{j=1}^{k} (\alpha_j/\gamma - c_{ij}) \le f_i$. In other words, we want to find the maximum of the ratio
$\sum_{j=1}^{k} \alpha_j / (f_i + \sum_{j=1}^{k} c_{ij})$.
We can define variables $f$, $d_j$, and $\alpha_j$, corresponding to facility cost, distances, and contributions
respectively and write the following maximization program:



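\[
\begin{array}{lll}
z_k = \text{maximize} & \dfrac{\sum_{j=1}^{k} \alpha_j}{f + \sum_{j=1}^{k} d_j} & \qquad (4) \\
\text{subject to} & \alpha_j \le \alpha_{j+1} & \forall j \in \{1, \ldots, k-1\} \\
 & \alpha_j \le \alpha_l + d_j + d_l & \forall j, l \in \{1, \ldots, k\} \\
 & \sum_{l=j}^{k} \max(\alpha_j - d_l, 0) \le f & \forall j \in \{1, \ldots, k\} \\
 & \alpha_j, d_j, f \ge 0 & \forall j \in \{1, \ldots, k\}
\end{array}
\]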
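The constraints of program (4) are easy to check mechanically for a candidate point. The following sketch is illustrative: the function names are hypothetical, and the point $\alpha = (1, 2)$, $d = (0, 1)$, $f = 1$ for $k = 2$ was picked by hand for this example. It is feasible and has objective value 1.5, which agrees with the factor the text reports for its $k = 2$ example; the sketch does not prove that this point is optimal.

```python
from fractions import Fraction

def is_feasible(alpha, d, f):
    """Check the constraints of program (4), with 0-based indices."""
    k = len(alpha)
    ordered = all(alpha[j] <= alpha[j + 1] for j in range(k - 1))
    metric = all(alpha[j] <= alpha[l] + d[j] + d[l]
                 for j in range(k) for l in range(k))
    paid = all(sum(max(alpha[j] - d[l], 0) for l in range(j, k)) <= f
               for j in range(k))
    nonneg = f >= 0 and all(x >= 0 for x in list(alpha) + list(d))
    return ordered and metric and paid and nonneg

def objective(alpha, d, f):
    """Ratio of total contribution to the cost of the star."""
    return Fraction(sum(alpha)) / (f + sum(d))

alpha, d, f = [1, 2], [0, 1], 1   # hand-picked candidate for k = 2
feasible = is_feasible(alpha, d, f)
value = objective(alpha, d, f)
```

Raising $\alpha_2$ slightly above 2 breaks both the metric constraint and the contribution constraint, which is why the candidate stops there.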

It is not difficult to prove that $z_k$ (the maximum value of the objective function of program (4)) is
equal to the optimal solution of the following linear program, which we call the factor-revealing LP.



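\[
\begin{array}{lll}
z_k = \text{maximize} & \sum_{j=1}^{k} \alpha_j & \qquad (5) \\
\text{subject to} & f + \sum_{j=1}^{k} d_j \le 1 & \\
 & \alpha_j \le \alpha_{j+1} & \forall j \in \{1, \ldots, k-1\} \\
 & \alpha_j \le \alpha_l + d_j + d_l & \forall j, l \in \{1, \ldots, k\} \\
 & x_{jl} \ge \alpha_j - d_l & \forall j, l \in \{1, \ldots, k\} \\
 & \sum_{l=j}^{k} x_{jl} \le f & \forall j \in \{1, \ldots, k\} \\
 & \alpha_j, d_j, f, x_{jl} \ge 0 & \forall j, l \in \{1, \ldots, k\}
\end{array}
\]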

Lemma 4 Let $\gamma = \sup_{k \ge 1}\{z_k\}$. Every facility is at most $\gamma$-overtight.

Proof: Consider facility $i$. We want to show that $\sum_j \max(\alpha_j/\gamma - c_{ij}, 0) \le f_i$. Suppose without
loss of generality that the subset of cities $j$ such that $\alpha_j \ge \gamma c_{ij}$ is $\{1, 2, \ldots, k\}$ for some $k$.
Moreover, $\alpha_1 \le \alpha_2 \le \cdots \le \alpha_k$. Let $d_j = c_{ij}$, $j = 1, \ldots, k$, and $f = f_i$. By Lemmas 2 and 3 it follows
immediately that the constraints of program (4) are satisfied. Therefore, $\alpha_j, d_j, f$ constitute a feasible
solution of program (4). Consequently, $\sum_{j=1}^{k} \alpha_j / (f + \sum_{j=1}^{k} d_j) \le z_k \le \gamma$. □

By what we said so far, we know that the approximation factor of our algorithm is at most
$\sup_{k \ge 1}\{z_k\}$. In the following theorem, we prove, by demonstrating an infinite family of instances,
that the approximation ratio of Algorithm 1 is not better than $\sup_{k \ge 1}\{z_k\}$.

Theorem 5 The approximation factor of our algorithm is precisely $\sup_{k \ge 1}\{z_k\}$.
Proof: Consider an optimum feasible solution of the factor-revealing LP (5). We construct an instance of the
facility location problem with $k$ cities and $k + 1$ facilities as follows: The cost of opening facility $i$ is

\[
f_i = \begin{cases} 0 & \text{if } 1 \le i \le k \\ f & \text{if } i = k + 1 \end{cases}
\]

The connection cost between a city $j$ and a facility $i$ is:

\[
c_{ij} = \begin{cases}
\alpha_j & \text{if } 1 \le i = j \le k \\
d_j & \text{if } 1 \le j \le k, \ i = k + 1 \\
d_i + d_j + \alpha_j & \text{otherwise}
\end{cases}
\]

It is easy to see that the connection costs satisfy the triangle inequality. On this instance, our
algorithm connects city 1 to facility 1, then it connects city 2 to facility 2, and so on, until finally it connects city
$k$ to facility $k$. (The inequality $\sum_{i=j}^{k} \max(\alpha_j - d_i, 0) \le f$ guarantees that city $j$ can get connected
to facility $j$ before facility $k + 1$ is opened.) Therefore, the cost of the restatement of Algorithm 1 is equal to
$\sum_{j=1}^{k} c_{jj} + \sum_{i=1}^{k} f_i = \sum_{j=1}^{k} \alpha_j = z_k$.

On the other hand, the optimal solution for this instance is to connect all the cities to facility $k + 1$.
The cost of this solution is equal to $\sum_{j=1}^{k} c_{k+1,j} + f_{k+1} = f + \sum_{j=1}^{k} d_j \le 1$.

Thus, our algorithm outputs a solution whose cost is at least $z_k$ times the cost of the optimal
solution. □

The only thing that remains is to find an upper bound on $\sup_{k \ge 1}\{z_k\}$. By solving the factor-revealing
LP for any particular value of $k$, we get a lower bound on the value of $\gamma$. In order to
prove an upper bound on $\gamma$, we need to present a general solution to the dual of the factor-revealing
LP. Unfortunately, this is not an easy task in general. (For example, performing a tight asymptotic
analysis of the LP bound is still an open question in coding theory.) However, here empirical results
can help us: we can solve the dual of the factor-revealing LP for small values of $k$ to get an idea of
what the general optimal solution looks like. Using this, it is usually possible (although sometimes
tedious) to prove a close-to-optimal upper bound on the value of $z_k$. We have used this technique
to prove an upper bound of 1.861 on $\gamma$.

Lemma 6 For every $k \ge 1$, $z_k \le 1.861$.



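Proof: Let $r = 1.8609$. By doubling a feasible solution of program (4) it is easy to show that $z_k \le z_{2k}$, so
we can assume, without loss of generality, that $k$ is sufficiently large. Consider a feasible solution of
program (4). It is clear from the third inequality that for every $j \le j'$ we have

\[ \sum_{i=j}^{j'} (\alpha_j - d_i) \le f. \qquad (6) \]

Now, we define $l_j$ and $\theta_j$ as follows:

\[
l_j = \begin{cases} p_2 k & \text{if } j \le p_1 k \\ k & \text{if } j > p_1 k \end{cases}
\qquad
\theta_j = \begin{cases}
\dfrac{r+1}{p_2 k} & \text{if } j \le p_1 k \\[2mm]
\dfrac{(r+1)(p_2 - p_1)}{p_2 (1 - p_1) k} & \text{if } p_1 k < j \le p_2 k \\[2mm]
0 & \text{if } j > p_2 k
\end{cases}
\]

where $p_1 = 0.1991$ and $p_2 = 0.5696$. We consider Inequality (6) for every $j \le p_2 k$ and $j' = l_j$, and
multiply both sides of this inequality by $\theta_j$. By adding up all these inequalities, we obtain

\[
\sum_{j=1}^{p_1 k} \sum_{i=j}^{p_2 k} \theta_j (\alpha_j - d_i)
+ \sum_{j=p_1 k+1}^{p_2 k} \sum_{i=j}^{k} \theta_j (\alpha_j - d_i)
\le \Big( \sum_{j=1}^{p_2 k} \theta_j \Big) f. \qquad (7)
\]

The coefficient of $f$ in the right-hand side of the above inequality is equal to

\[
\sum_{j=1}^{p_2 k} \theta_j = \frac{r+1}{p_2 k}\, p_1 k + \frac{(r+1)(p_2 - p_1)}{p_2 (1 - p_1) k}\, (p_2 k - p_1 k) \approx 1.8609 < 1.861.
\]

Also, the coefficients of $\alpha_j$ and $-d_j$ in the left-hand side of Inequality (7) are equal to

\[
\mathrm{coeff}[\alpha_j] = \begin{cases}
(p_2 k - j + 1)\theta_j & \text{if } j \le p_1 k \\
(k - j + 1)\theta_j & \text{if } p_1 k < j \le p_2 k \\
0 & \text{if } j > p_2 k
\end{cases} \qquad (8)
\]

\[
\mathrm{coeff}[d_j] = \begin{cases}
\sum_{i=1}^{j} \theta_i & \text{if } j \le p_2 k \\
\sum_{i=p_1 k+1}^{j} \theta_i & \text{if } j > p_2 k
\end{cases} \qquad (9)
\]

Notice that the sum of the coefficients of the $\alpha_j$'s is equal to

\[
\sum_{j=1}^{k} \mathrm{coeff}[\alpha_j]
= \sum_{j=1}^{p_1 k} (p_2 k - j + 1)\theta_j + \sum_{j=p_1 k+1}^{p_2 k} (k - j + 1)\theta_j
\approx 1.00004k > k.
\]

Now, we use the inequality $\alpha_l \ge \alpha_j - d_j - d_l$ on the expression on the left-hand side of inequality
(7) to reduce the coefficients of the $\alpha_j$'s that are greater than 1, and increase the coefficients of the $\alpha_j$'s that
are less than 1. Since the sum of these coefficients is greater than $k$, using this inequality and the
inequality $\alpha_j \ge 0$ we can obtain an expression $E$ that is less than or equal to the left-hand side of
inequality (7), and in which all $\alpha_j$'s have coefficient 1. The coefficient of $d_j$ in this expression will be
equal to its coefficient in the left-hand side of inequality (7), plus the absolute value of the change in
the coefficient of the corresponding $\alpha_j$. Therefore, by equations (8) and (9) this coefficient is equal to:

\[
\mathrm{coeff}_E[d_j] = \begin{cases}
\sum_{i=1}^{j} \theta_i + |(p_2 k - j + 1)\theta_j - 1| & \text{if } j \le p_1 k \\
\sum_{i=1}^{j} \theta_i + |(k - j + 1)\theta_j - 1| & \text{if } p_1 k < j \le p_2 k \\
\sum_{i=p_1 k+1}^{j} \theta_i + |(k - j + 1)\theta_j - 1| & \text{if } j > p_2 k
\end{cases}
\]

If $j \le p_1 k$, we have $(p_2 k - j + 1)\theta_j > (p_2 k - p_1 k)\frac{r+1}{p_2 k} = (r+1)(p_2 - p_1)/p_2 \approx 1.8609 > 1$. Therefore,

\[
\mathrm{coeff}_E[d_j] = \sum_{i=1}^{j} \theta_i + (p_2 k - j + 1)\theta_j - 1 = r + o(1) < 1.861.
\]

Similarly, if $p_1 k < j \le p_2 k$, we have
$(k - j + 1)\theta_j > (k - p_2 k)\frac{(r+1)(p_2 - p_1)}{p_2 (1 - p_1) k} = \frac{(r+1)(1 - p_2)(p_2 - p_1)}{p_2 (1 - p_1)} \approx 1.00003 > 1$. Therefore,

\[
\mathrm{coeff}_E[d_j] = \sum_{i=1}^{j} \theta_i + (k - j + 1)\theta_j - 1 = r + o(1) < 1.861.
\]

Finally, if $j > p_2 k$, the coefficient of $d_j$ is equal to

\[
\mathrm{coeff}_E[d_j] = \sum_{i=p_1 k+1}^{j} \theta_i + |0 - 1|
= \frac{(r+1)(p_2 - p_1)}{p_2 (1 - p_1) k}\, (p_2 k - p_1 k) + 1 \approx 1.8609 < 1.861.
\]

Therefore, in each case, the coefficient of $d_j$ is less than or equal to 1.861. Thus, we have proved
that

\[ \sum_{j=1}^{k} \alpha_j - \sum_{j=1}^{k} 1.861\, d_j \le 1.861 f. \]

This clearly implies that $z_k \le 1.861$. □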

Figure 1 shows a tight example for $k = 2$, for which the approximation factor of the algorithm is
1.5. The cost of the missing edges is given by the triangle inequality. Numerical computations using
the software CPLEX show that $z_{300} \approx 1.81$. Thus, the approximation factor of our algorithm is
between 1.81 and 1.861. We do not know the exact approximation ratio.



4 Algorithm 2 

Algorithm 2 is similar to the restatement of Algorithm 1. The only difference is that in Algorithm 1 
cities stop offering money to facilities as soon as they get connected to a facility, but here they still 
offer some money to other facilities. The amount that an already-connected city offers to a facility 
j is equal to the amount that it would save in connection cost by switching its facility to j. As 
we will see in the next section, this change reduces the approximation factor of the algorithm from 
1.861 to 1.61. 
Figure 1: The approximation ratio of Algorithm 1 is at least 1.5

Algorithm 2 

1. We introduce a notion of time. The algorithm starts at time 0. At this time, each city is
defined to be unconnected ($U := \mathcal{C}$), all facilities are unopened, and $\alpha_j$ is set to 0 for every $j$.

At every moment, each city $j$ offers some money from its contribution to each unopened facility
$i$. The amount of this offer is computed as follows: If $j$ is unconnected, the offer is equal to
$\max(\alpha_j - c_{ij}, 0)$ (i.e., if the contribution of $j$ is more than the cost that it has to pay to get
connected to $i$, it offers to pay this extra amount to $i$); if $j$ is already connected to some other
facility $i'$, then its offer to facility $i$ is equal to $\max(c_{i'j} - c_{ij}, 0)$ (i.e., the amount that $j$ offers
to pay to $i$ is equal to the amount $j$ would save by switching its facility from $i'$ to $i$).

2. While $U \neq \emptyset$, increase the time, and simultaneously, for every city $j \in U$, increase the
parameter $\alpha_j$ at the same rate, until one of the following events occurs (if two events occur at
the same time, we process them in an arbitrary order).

(a) For some unopened facility $i$, the total offer that it receives from cities is equal to the
cost of opening $i$. In this case, we open facility $i$, and for every city $j$ (connected or
unconnected) which has a non-zero offer to $i$, we connect $j$ to $i$. The amount that $j$ had
offered to $i$ is now called the contribution of $j$ toward $i$, and $j$ is no longer allowed to
decrease this contribution.

(b) For some unconnected city $j$, and some open facility $i$, $\alpha_j = c_{ij}$. In this case, connect
city $j$ to facility $i$ and remove $j$ from $U$.
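The offer rule in step 1 is the only place where Algorithm 2 differs from the restatement of Algorithm 1, and it is short enough to sketch on its own. The helper names, the dictionary `connected_to`, and the numbers below are hypothetical, and the sketch covers only the offers, not the full event loop.

```python
def offer(j, i, alpha, connected_to, c):
    """Offer of city j to unopened facility i.

    alpha[j]: current contribution of city j; connected_to maps a connected
    city to its facility; c[i][j]: connection cost between facility i and city j.
    """
    if j not in connected_to:
        # Unconnected city: whatever exceeds its connection cost to i.
        return max(alpha[j] - c[i][j], 0)
    current = connected_to[j]
    # Connected city: the saving from switching from its current facility to i.
    return max(c[current][j] - c[i][j], 0)

def total_offer(i, alpha, connected_to, c):
    return sum(offer(j, i, alpha, connected_to, c) for j in range(len(alpha)))

# Hypothetical state: city 2 is connected to facility 1, cities 0 and 1 are not.
c = [[1, 2, 5],
     [4, 4, 1]]
alpha = [3, 3, 2]
connected_to = {2: 1}
offers_to_0 = [offer(j, 0, alpha, connected_to, c) for j in range(3)]
```

In this state cities 0 and 1 offer 2 and 1 to facility 0, while city 2 offers nothing because switching from facility 1 would not save it anything.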

Clearly the main issue in the facility location problem is to decide which facilities to open. Once 
this is done, each city should be connected to the closest open facility. Observe that Algorithm 2 
makes greedy choices in deciding which facilities to open and once it opens a facility, it does not 
alter this decision. In this sense, it is also a greedy algorithm. 

5 Analysis of Algorithm 2 

The following fact should be obvious from the description of Algorithm 2.

Lemma 7 The total cost of the solution found by Algorithm 2 is equal to the sum of the $\alpha_j$'s.

Now, as in the analysis of Algorithm 1, we need to find a number $\gamma$ such that for every star $S$,
$\sum_{j \in S \cap C} \alpha_j \le \gamma c_S$. Such a $\gamma$ will be an upper bound on the approximation ratio of the algorithm,
since if for every facility $i$ that is opened in the optimal solution and the collection $A$ of cities that are
connected to it, we write the inequality $\sum_{j \in A} \alpha_j \le \gamma (f_i + \sum_{j \in A} c_{ij})$ and add up these inequalities,
we will obtain that the cost of our solution is at most $\gamma$ times the cost of the optimal solution.

5.1 Deriving the factor-revealing LP 

Our proof follows the methodology used in the analysis of Algorithm 1: express various constraints that are imposed by
the problem or by the structure of the algorithm as inequalities and get a bound on the value of $\gamma$
defined above by solving a series of linear programs.

Consider a star $S$ consisting of a facility having opening cost $f$ (with a slight misuse of the notation,
we call this facility $f$), and $k$ cities numbered 1 through $k$. Let $d_j$ denote the connection cost between
facility $f$ and city $j$, and $\alpha_j$ denote the contribution of the city $j$ at the end of Algorithm 2. We
may assume without loss of generality that

$$\alpha_1 \le \alpha_2 \le \cdots \le \alpha_k. \qquad (10)$$

We need more variables to capture the execution of Algorithm 2. For every $i$ ($1 \le i \le k$), consider
the situation of the algorithm at time $t = \alpha_i - \epsilon$, where $\epsilon$ is very small, i.e., just a moment before city
$i$ gets connected for the first time. At this time, each of the cities $1, 2, \ldots, i-1$ might be connected
to a facility. For every $j < i$, if city $j$ is connected to some facility at time $t$, let $r_{j,i}$ denote the
connection cost between this facility and city $j$; otherwise, let $r_{j,i} := \alpha_j$. The latter case occurs if
and only if $\alpha_i = \alpha_j$. It turns out that these variables ($f$, $d_j$'s, $\alpha_j$'s, and $r_{j,i}$'s) are enough to write
down some inequalities to bound the ratio of the sum of $\alpha_j$'s to the cost of $S$ (i.e., $f + \sum_{j=1}^{k} d_j$).

First, notice that once a city gets connected to a facility, its contribution remains constant and it
cannot revoke its contribution to a facility, so it can never get connected to another facility with a
higher connection cost. This implies that for every $j$,

$$r_{j,j+1} \ge r_{j,j+2} \ge \cdots \ge r_{j,k}. \qquad (11)$$

Now, consider time $t = \alpha_i - \epsilon$. At this time, the amount city $j$ offers to facility $f$ is equal to

$\max(r_{j,i} - d_j, 0)$ if $j < i$, and
$\max(t - d_j, 0)$ if $j \ge i$.

Notice that by the definition of $r_{j,i}$ this holds even if $j < i$ and $\alpha_i = \alpha_j$. It is clear from Algorithm 2
that the total offer of cities to a facility can never become larger than the opening cost of the
facility. Therefore, for all $i$,

$$\sum_{j=1}^{i-1} \max(r_{j,i} - d_j, 0) + \sum_{j=i}^{k} \max(\alpha_i - d_j, 0) \le f. \qquad (12)$$

The triangle inequality is another important constraint that we need to use. Consider cities $i$ and
$j$ with $j < i$ at time $t = \alpha_i - \epsilon$. Let $f'$ be the facility $j$ is connected to at time $t$. By the triangle
inequality and the definition of $r_{j,i}$, the connection cost $c_{f'i}$ between city $i$ and facility $f'$ is at most
$r_{j,i} + d_i + d_j$. Furthermore, $c_{f'i}$ cannot be less than $t$, since if it is, our algorithm could have
connected the city $i$ to the facility $f'$ at a time earlier than $t$, which is a contradiction. Here we
need to be careful with the special case $\alpha_i = \alpha_j$. In this case, $r_{j,i} = \alpha_j = \alpha_i > t$, so $r_{j,i} + d_i + d_j$ is not less than $t$. If
$\alpha_i \neq \alpha_j$, the facility $f'$ is open at time $t$ and therefore city $i$ can get connected to it, if it can pay
the connection cost. Therefore for every $1 \le j < i \le k$,

$$\alpha_i \le r_{j,i} + d_i + d_j. \qquad (13)$$

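The above inequalities form the following factor-revealing LP.

$$\text{maximize} \quad \frac{\sum_{i=1}^{k} \alpha_i}{f + \sum_{i=1}^{k} d_i} \qquad (14)$$

$$\begin{array}{rll}
\text{subject to} & \forall\, 1 \le i < k: & \alpha_i \le \alpha_{i+1} \\
& \forall\, 1 \le j < i < k: & r_{j,i} \ge r_{j,i+1} \\
& \forall\, 1 \le j < i \le k: & \alpha_i \le r_{j,i} + d_i + d_j \\
& \forall\, 1 \le i \le k: & \sum_{j=1}^{i-1} \max(r_{j,i} - d_j, 0) + \sum_{j=i}^{k} \max(\alpha_i - d_j, 0) \le f \\
& \forall\, 1 \le j \le i \le k: & \alpha_j, d_j, f, r_{j,i} \ge 0
\end{array}$$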

Notice that although the above optimization program is not written in the form of a linear program, 
it is easy to change it to a linear program by introducing new variables and inequalities. 
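
To make the constraints of program (14) concrete, the following Python sketch checks a candidate point against them and evaluates the objective. It is an illustration only: the function names `violations` and `ratio`, the dictionary layout `r[(j, i)]` (1-indexed, $j < i$), and the sample point for $k = 2$ are assumptions made here, and the sample point is merely feasible, not claimed to be optimal.

```python
def violations(alpha, d, f, r, tol=1e-9):
    """List the constraints of program (14) that the point violates.

    alpha, d: lists of length k (city 1 is index 0); f: opening cost;
    r: dict mapping (j, i) with 1 <= j < i <= k to r_{j,i}.
    """
    k = len(alpha)
    a = [None] + list(alpha)   # shift to 1-based indexing
    dd = [None] + list(d)
    bad = []
    for i in range(1, k):
        if a[i] > a[i + 1] + tol:
            bad.append(("order", i))
    for j in range(1, k + 1):
        for i in range(j + 1, k):
            if r[(j, i)] < r[(j, i + 1)] - tol:
                bad.append(("monotone", j, i))
    for i in range(1, k + 1):
        for j in range(1, i):
            if a[i] > r[(j, i)] + dd[i] + dd[j] + tol:
                bad.append(("triangle", j, i))
        offer = sum(max(r[(j, i)] - dd[j], 0) for j in range(1, i))
        offer += sum(max(a[i] - dd[j], 0) for j in range(i, k + 1))
        if offer > f + tol:
            bad.append(("budget", i))
    values = list(alpha) + list(d) + [f] + list(r.values())
    if min(values) < -tol:
        bad.append(("nonnegative",))
    return bad


def ratio(alpha, d, f):
    """Objective of program (14)."""
    return sum(alpha) / (f + sum(d))


# A feasible point for k = 2 (chosen by hand for illustration).
alpha, d, f, r = [1.0, 1.5], [0.0, 1.0], 1.0, {(1, 2): 0.5}
print(violations(alpha, d, f, r), ratio(alpha, d, f))
```

The sample point satisfies every constraint and has objective value 1.25, so it certifies only that $z_2 \ge 1.25$.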



Lemma 8 If $z_k$ denotes the solution of the factor-revealing LP, then for every star $S$ consisting of
a facility and $k$ cities, the sum of $\alpha_j$'s of the cities in $S$ in Algorithm 2 is at most $z_k c_S$.

Proof: Inequalities 11, 12, and 13 derived above imply that the values $\alpha_j, d_j, f, r_{j,i}$ that we
get by running Algorithm 2 constitute a feasible solution of the factor-revealing LP. Thus, the value
of the objective function for this solution is at most $z_k$. □

Lemmas 7 and 8 imply the following.

Lemma 9 Let $z_k$ be the solution of the factor-revealing LP, and $\gamma := \sup_k \{z_k\}$. Then Algorithm 2
solves the metric facility location problem with an approximation factor of $\gamma$.



5.2 Solving the factor-revealing LP 

As mentioned earlier, the optimization program (14) can be written as a linear program. This
enables us to use an LP-solver to solve the factor-revealing LP for small values of $k$, in order to
compute the numerical value of $\gamma$. Table 1 shows a summary of results that are obtained by solving
the factor-revealing LP using CPLEX. It seems from the experimental results that $z_k$ is an increasing
sequence that converges to some number close to 1.6 and hence $\gamma \approx 1.6$.

We use the same idea as in the analysis of Algorithm 1 to prove the upper bound of 1.61 on $z_k$.

    k      $\max_{i \le k} z_i$
    10     1.54147
    20     1.57084
    50     1.58839
    100    1.59425
    200    1.59721
    300    1.59819
    400    1.59868
    500    1.59898

Table 1: Solution of the factor-revealing LP



Lemma 10 Let $z_k$ be the solution to the factor-revealing LP. Then for every $k$, $z_k \le 1.61$.



Proof: Using the same argument as in the corresponding lemma for Algorithm 1, we can assume, without loss of generality,
that $k$ is sufficiently large. Consider a feasible solution of the factor-revealing LP. Let $x_{j,i} :=
\max(r_{j,i} - d_j, 0)$. The fourth inequality of the factor-revealing LP implies that for every $i \le i'$,

$$(i' - i + 1)\,\alpha_i \le \sum_{j=i}^{i'} d_j + f - \sum_{j=1}^{i-1} x_{j,i}. \qquad (15)$$


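Now, we define $l_i$ as follows:

$$l_i = \begin{cases} p_2 k & \text{if } i \le p_1 k \\ k & \text{if } i > p_1 k \end{cases}$$

where $p_1$ and $p_2$ are two constants (with $p_1 < p_2$) that will be fixed later. Consider Inequality 15
for every $i \le p_2 k$ and $i' = l_i$, and divide both sides of this inequality by $(l_i - i + 1)$. By adding up
these inequalities we obtain

$$\sum_{i=1}^{p_2 k} \alpha_i \le \sum_{i=1}^{p_2 k} \sum_{j=i}^{l_i} \frac{d_j}{l_i - i + 1} + \Bigl(\sum_{i=1}^{p_2 k} \frac{1}{l_i - i + 1}\Bigr) f - \sum_{i=1}^{p_2 k} \sum_{j=1}^{i-1} \frac{x_{j,i}}{l_i - i + 1}. \qquad (16)$$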

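Now for every $j \le p_2 k$, let $y_j := x_{j,p_2 k}$. The second inequality of the factor-revealing LP implies
that $x_{j,i} \ge y_j$ for every $j < i \le p_2 k$ and $x_{j,i} \le y_j$ for every $i > p_2 k$. Also, let $\zeta := \sum_{i=1}^{p_2 k} \frac{1}{l_i - i + 1}$.
Therefore, inequality 16 implies

$$\sum_{i=1}^{p_2 k} \alpha_i \le \sum_{i=1}^{p_2 k} \sum_{j=i}^{l_i} \frac{d_j}{l_i - i + 1} + \zeta f - \sum_{i=1}^{p_2 k} \sum_{j=1}^{i-1} \frac{y_j}{l_i - i + 1}. \qquad (17)$$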

Consider the index $\ell \le p_2 k$ for which $2d_\ell + y_\ell$ has its minimum (i.e., for every $j \le p_2 k$, $2d_\ell + y_\ell \le
2d_j + y_j$). The third inequality of the factor-revealing LP implies that for $i = p_2 k + 1, \ldots, k$,

$$\alpha_i \le r_{\ell,i} + d_i + d_\ell \le x_{\ell,i} + 2d_\ell + d_i \le d_i + 2d_\ell + y_\ell. \qquad (18)$$



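By adding Inequality 18 for $i = p_2 k + 1, \ldots, k$ with Inequality 17 we obtain

$$\begin{aligned}
\sum_{i=1}^{k} \alpha_i
&\le \sum_{i=1}^{p_2 k} \sum_{j=i}^{l_i} \frac{d_j}{l_i - i + 1} + \sum_{j=p_2 k+1}^{k} d_j - \sum_{i=1}^{p_2 k} \sum_{j=1}^{i-1} \frac{y_j}{l_i - i + 1} + (2d_\ell + y_\ell)(1 - p_2)k + \zeta f \\
&= \sum_{j=1}^{p_2 k} \Bigl(\sum_{i=1}^{j} \frac{1}{l_i - i + 1}\Bigr) d_j + \sum_{j=p_2 k+1}^{k} \Bigl(1 + \sum_{i=p_1 k+1}^{p_2 k} \frac{1}{k - i + 1}\Bigr) d_j - \sum_{j=1}^{p_2 k} \Bigl(\sum_{i=j+1}^{p_2 k} \frac{1}{l_i - i + 1}\Bigr) y_j \\
&\qquad + (2d_\ell + y_\ell)(1 - p_2)k + \zeta f \\
&\le \sum_{j=1}^{p_2 k} \zeta d_j + \sum_{j=p_2 k+1}^{k} \Bigl(1 + \sum_{i=p_1 k+1}^{p_2 k} \frac{1}{k - i + 1}\Bigr) d_j + \zeta f \\
&\qquad + (2d_\ell + y_\ell)\Bigl((1 - p_2)k - \frac{1}{2} \sum_{j=1}^{p_2 k} \sum_{i=j+1}^{p_2 k} \frac{1}{l_i - i + 1}\Bigr),
\end{aligned}$$

where the last inequality is a consequence of the inequality $2d_\ell + y_\ell \le 2d_j + y_j \le 2d_j + 2y_j$ for
$j \le p_2 k$. Now, let $\zeta' := 1 + \sum_{i=p_1 k+1}^{p_2 k} \frac{1}{k - i + 1}$ and $\delta := (1 - p_2) - \frac{1}{2k} \sum_{j=1}^{p_2 k} \sum_{i=j+1}^{p_2 k} \frac{1}{l_i - i + 1}$. Therefore,
the above inequality can be written as follows:

$$\sum_{i=1}^{k} \alpha_i \le \sum_{j=1}^{p_2 k} \zeta d_j + \sum_{j=p_2 k+1}^{k} \zeta' d_j + \zeta f + \delta (2d_\ell + y_\ell) k, \qquad (19)$$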

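where

$$\zeta = \sum_{i=1}^{p_2 k} \frac{1}{l_i - i + 1} = \ln \frac{p_2 (1 - p_1)}{(p_2 - p_1)(1 - p_2)} + o(1), \qquad (20)$$

$$\zeta' = 1 + \sum_{i=p_1 k+1}^{p_2 k} \frac{1}{k - i + 1} = 1 + \ln \frac{1 - p_1}{1 - p_2} + o(1), \qquad (21)$$

$$\delta = 1 - p_2 - \frac{1}{2k} \sum_{j=1}^{p_2 k} \sum_{i=j+1}^{p_2 k} \frac{1}{l_i - i + 1} = \frac{1}{2}\Bigl(2 - p_2 - p_2 \ln \frac{p_2}{p_2 - p_1} - \ln \frac{1 - p_1}{1 - p_2}\Bigr) + o(1). \qquad (22)$$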

Now if we choose $p_1$ and $p_2$ such that $\delta < 0$, and let $\gamma := \max(\zeta, \zeta')$, then inequality 19 implies that

$$\sum_{i=1}^{k} \alpha_i \le (\gamma + o(1)) \Bigl(f + \sum_{i=1}^{k} d_i\Bigr).$$

Using equations 20, 21, and 22, it is easy to see that subject to the condition $\delta < 0$, the value of $\gamma$
is minimized when $p_1 \approx 0.439$ and $p_2 \approx 0.695$, which gives us $\gamma < 1.61$. □
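
The last step of the proof can be checked numerically from the asymptotic forms of $\zeta$, $\zeta'$ and $\delta$ (ignoring the $o(1)$ terms). The Python sketch below is an illustration only: the function name `asymptotic_constants` is hypothetical, and the specific values $p_2 = 0.6955$, $p_1 = p_2(1 - 1/e)$ are an assumption chosen here, close to the rounded constants quoted in the proof, so that $\delta$ comes out strictly negative.

```python
import math


def asymptotic_constants(p1, p2):
    """Leading terms of zeta, zeta' and delta for constants 0 < p1 < p2 < 1."""
    a = math.log(p2 / (p2 - p1))
    b = math.log((1 - p1) / (1 - p2))
    zeta = a + b                      # ln of the product of the two ratios
    zeta_prime = 1 + b
    delta = 0.5 * (2 - p2 - p2 * a - b)
    return zeta, zeta_prime, delta


# Illustrative choice near the rounded values (0.439, 0.695) in the proof.
p2 = 0.6955
p1 = p2 * (1 - math.exp(-1))          # makes ln(p2 / (p2 - p1)) equal to 1
zeta, zeta_prime, delta = asymptotic_constants(p1, p2)
gamma = max(zeta, zeta_prime)
print(f"p1={p1:.4f} p2={p2:.4f} delta={delta:.5f} gamma={gamma:.5f}")
```

With this choice $\delta$ is slightly negative and $\gamma$ is slightly below 1.61, in line with the bound stated in the lemma.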

Also, as in the proof of Theorem 5, we can use the optimal solution of the factor-revealing LP that
is computed numerically (see Table 1) to construct an example on which our algorithm performs at
least $z_k$ times worse than the optimum. These results imply the following.

Theorem 11 Algorithm 2 solves the facility location problem in time $O(n^3)$, where $n = \max(n_f, n_c)$,
with an approximation ratio between 1.598 and 1.61.

Figure 2: The tradeoff between $\gamma_f$ and $\gamma_c$



6 The tradeoff between facility and connection costs 

We defined the cost of a solution in the facility location problem as the sum of the facility cost
(i.e., total cost of opening facilities) and the connection cost. We proved in the previous section 
that Algorithm 2 achieves an overall performance guarantee of 1.61. However, sometimes it is useful 
to get different approximation guarantees for facility and connection costs. The following theorem 
gives such a guarantee. The proof is similar to the proof of Lemma ^. 

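Theorem 12 Let $\gamma_f \ge 1$ and $\gamma_c := \sup_k \{z_k\}$, where $z_k$ is the solution of the following optimization
program.

$$\text{maximize} \quad \frac{\sum_{i=1}^{k} \alpha_i - \gamma_f f}{\sum_{i=1}^{k} d_i} \qquad (23)$$

$$\begin{array}{rll}
\text{subject to} & \forall\, 1 \le i < k: & \alpha_i \le \alpha_{i+1} \\
& \forall\, 1 \le j < i < k: & r_{j,i} \ge r_{j,i+1} \\
& \forall\, 1 \le j < i \le k: & \alpha_i \le r_{j,i} + d_i + d_j \\
& \forall\, 1 \le i \le k: & \sum_{j=1}^{i-1} \max(r_{j,i} - d_j, 0) + \sum_{j=i}^{k} \max(\alpha_i - d_j, 0) \le f \\
& \forall\, 1 \le j \le i \le k: & \alpha_j, d_j, f, r_{j,i} \ge 0
\end{array}$$

Then for every instance $I$ of the facility location problem, and for every solution SOL for $I$ with
facility cost $F_{SOL}$ and connection cost $C_{SOL}$, the cost of the solution found by Algorithm 2 is at
most $\gamma_f F_{SOL} + \gamma_c C_{SOL}$.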



We have computed the solution of the optimization program 23 for $k = 100$, and several values of $\gamma_f$
between 1 and 3, to get an estimate of the corresponding $\gamma_c$'s. The result is shown in the diagram
in Figure 2. Every point $(\gamma_f, \gamma_c)$ on the thick line in this diagram represents a value of $\gamma_f$, and the
corresponding estimate for the value of $\gamma_c$. The dashed line shows the following lower bound, which
can be proved easily by adapting the proof of Guha and Khuller [15] for hardness of the facility
location problem.

Theorem 13 Let $\gamma_f$ and $\gamma_c$ be constants with $\gamma_c < 1 + 2e^{-\gamma_f}$. Assume there is an algorithm A
such that for every instance $I$ of the metric facility location problem, A finds a solution whose cost
is not more than $\gamma_f F_{SOL} + \gamma_c C_{SOL}$ for every solution SOL for $I$ with facility and connection costs
$F_{SOL}$ and $C_{SOL}$. Then $NP \subseteq DTIME[n^{O(\log \log n)}]$.
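
The lower-bound curve of Theorem 13 is easy to tabulate. The Python sketch below is a small illustration; the function names `hardness_threshold` and `ruled_out` are hypothetical helpers introduced here, and `ruled_out` simply tests whether a pair $(\gamma_f, \gamma_c)$ lies strictly below the curve $\gamma_c = 1 + 2e^{-\gamma_f}$.

```python
import math


def hardness_threshold(gamma_f):
    """Value of 1 + 2 * exp(-gamma_f) from the condition in Theorem 13."""
    return 1 + 2 * math.exp(-gamma_f)


def ruled_out(gamma_f, gamma_c):
    """True if the pair lies strictly below the lower-bound curve."""
    return gamma_c < hardness_threshold(gamma_f)


for gamma_f in (1.0, 1.5, 2.0, 3.0):
    print(gamma_f, round(hardness_threshold(gamma_f), 4))
```

At $\gamma_f = 1$ the threshold is $1 + 2/e \approx 1.736$, so a guarantee with $\gamma_f = 1$ and $\gamma_c = 2$ is not excluded by the theorem.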

Similar tradeoff problems are considered by Charikar and Guha. However, an important advantage
that we get here is that all the inequalities $ALG \le \gamma_f F_{SOL} + \gamma_c C_{SOL}$ are satisfied by a single
algorithm. In Section 8, we will use the point $\gamma_f = 1$ of this tradeoff to design algorithms for other
variants of the facility location problem. Other points of this tradeoff can also be useful in designing
other algorithms based on our algorithm. For example, Mahdian, Ye, and Zhang use the point
$\gamma_f = 1.1$ of this tradeoff to obtain a 1.52-approximation algorithm for the metric facility location
problem.



7 Experimental Results 

We have implemented our algorithms, as well as the JV algorithm, using the programming language 
C. We have made four kinds of experiments. In all cases the solution of the algorithms is compared 
to the optimal solution of the LP-relaxation, computed using the package CPLEX to obtain an 
upper bound on the approximation factor of the algorithms. 

The test bed of our first set of experiments consists of randomly generated instances on a 10,000 ×
10,000 grid: In each instance, cities and facilities are points, drawn randomly from the grid. The
connection cost between a city and a facility is set to be equal to the Euclidean distance of the
corresponding points. Furthermore, the opening cost of each facility is drawn uniformly at random
from the integers between 0 and 9999.
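
A generator for instances of this first kind might look as follows. This Python sketch is an assumption-laden illustration rather than the code used for the experiments: the function name `random_grid_instance`, the use of integer coordinates in the range 0 to 9999, and the fixed seed are all choices made here.

```python
import math
import random


def random_grid_instance(n_cities, n_facilities, side=10_000, max_open=9999, seed=0):
    """Return (f, c): opening costs and a facility-by-city distance matrix."""
    rng = random.Random(seed)

    def point():
        # Integer grid coordinates; the exact sampling scheme is an assumption.
        return (rng.randrange(side), rng.randrange(side))

    cities = [point() for _ in range(n_cities)]
    facilities = [point() for _ in range(n_facilities)]
    f = [rng.randint(0, max_open) for _ in facilities]
    c = [[math.dist(p, q) for q in cities] for p in facilities]
    return f, c


f, c = random_grid_instance(50, 20)
print(len(f), len(c), len(c[0]))
```

Because the connection costs are Euclidean distances between points, the resulting instance is metric by construction.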

For the second set of experiments, we have generated random graphs (according to the distribution
$G(n,p)$) and assigned uniform random weights on the edges. Cities and facilities correspond to the
nodes of this graph, and the connection cost between a city and a facility is defined to be the shortest
path between the corresponding nodes. The opening costs of facilities are generated at random.

The instance sizes in both of the above types vary from 50 cities and 20 facilities to 400 cities and 150 
facilities. For each size, 15 instances are generated and the average error of the algorithm (compared 
to the LP lower bound) is computed. The results of these experiments are shown in Table 2.

An Internet topology generator, namely GT-ITM, is used to generate the third set of
instances. GT-ITM is a software package for generating graphs that have a structure modeling the
topology of the Internet [45]. This model is used because of the applications of facility location
problems in network applications such as placing web server replicas. In this model we consider
transit nodes as potential facilities and stub nodes as cities. The connection cost is the distance
produced by the generator. The opening costs are again random numbers. We have generated 10
instances for each of the 10 different instance sizes. The results are shown in Table 3.

We also tested all algorithms on 15 instances from the Operations Research library, which is a library of test data sets for several
operations research problems. Our results are shown in Table 4.

As we can see from the tables, Algorithm 2 behaves extremely well, giving almost no error in many
cases. Algorithm 1 has an error of 7% on the worst instance and an average error of 2-3%. On
the other hand, the JV algorithm has much larger error, sometimes as high as 100%. We should

                Random Points on a Grid        Random Graphs
    n_c   n_f   JV       ALG 1    ALG 2        JV       ALG 1    ALG 2
    50    20    1.0927   1.0083   1.0004       1.0021   1.0007   1.0001
    100   20    1.0769   1.0082   1.0004       1.0014   1.0022   1.0
    100   50    1.2112   1.0105   1.0013       1.0225   1.0056   1.0005
    200   50    1.159    1.0095   1.001        1.0106   1.0094   1.0002
    200   100   1.301    1.0105   1.0016       1.0753   1.0178   1.0018
    300   50    1.1151   1.0091   1.0011       1.0068   1.0102   1.0002
    300   80    1.1787   1.0116   1.001        1.0259   1.0171   1.0004
    300   100   1.2387   1.0118   1.0014       1.0455   1.0185   1.0009
    300   150   1.327    1.0143   1.0015       1.1365   1.0249   1.0018
    400   50    1.0905   1.0092   1.0005       1.0044   1.012    1.0
    400   100   1.8513   1.0301   1.0026       1.0313   1.0203   1.0003
    400   150   1.8112   1.0299   1.0023       1.1008   1.0234   1.0009

Table 2: Random Graphs and Random Points on a Grid





    n_c   n_f   JV       ALG 1    ALG 2
    100   20    1.004    1.0047   1.0001
    160   20    1.5116   1.0612   1.0009
    160   40    1.065    1.0063   1.0
    208   52    2.2537   1.074    1.019
    240   60    1.0083   1.0045   1.0001
    300   75    1.8088   1.0478   1.0006
    312   52    1.7593   1.0475   1.0008
    320   32    1.0972   1.0015   1.0
    400   100   1.0058   1.0048   1.0
    416   52    1.0031   1.0048   1.0

Table 3: GT-ITM Model






    n_c    n_f   JV       ALG 1    ALG 2
    50     16    1.0642   1.0156   1.0
    50     16    1.127    1.0363   1.0
    50     16    1.1968   1.0258   1.0
    50     16    1.2649   1.0258   1.0022
    50     25    1.1167   1.006    1.0028
    50     25    1.2206   1.0393   1.0
    50     25    1.3246   1.0277   1.0
    50     25    1.4535   1.0318   1.0049
    50     50    1.3566   1.0101   1.0017
    50     50    1.5762   1.0348   1.0061
    50     50    1.7648   1.0378   1.0022
    50     50    2.0543   1.0494   1.0075
    1000   100   1.0453   1.0542   1.0023
    1000   100   1.0155   1.0226   1.0
    1000   100   1.0055   1.0101   1.0

Table 4: Instances from Operations Research library



also note that the running times of the three algorithms did not vary significantly. In the biggest 
instances of 1000 cities and 100 facilities all the algorithms ran in approximately 1-2 seconds. The 
implementation of the algorithms as well as all the data sets are available upon request. For other 
experimental results see [^. 



8 Variants of the problem 

In this section, we show that our algorithms can also be applied to several variants of the metric 
facility location problem. 



8.1 The k-median problem

The $k$-median problem differs from the facility location problem in two respects: there is no cost
for opening facilities, and there is an upper bound $k$, that is supplied as part of the input, on the
number of facilities that can be opened. The $k$-facility location problem is a common generalization
of $k$-median and the facility location problem. In this problem, we have an upper bound $k$ on the
number of facilities that can be opened, as well as costs for opening facilities. The $k$-median problem
has been studied extensively, and the best known approximation algorithm for this problem,
due to Arya et al., achieves a factor of $3 + \epsilon$. It is also straightforward to adapt the proof of
hardness of the facility location problem [15] to show that there is no $(1 + \frac{2}{e} - \epsilon)$-approximation
algorithm for $k$-median, unless $NP \subseteq DTIME[n^{O(\log \log n)}]$. Notice that this proves that $k$-median is
a strictly harder problem to approximate than the facility location problem because the latter can
be approximated within a factor of 1.61.

Jain and Vazirani [22] reduced the $k$-median problem to the facility location problem in the following
sense: Suppose A is an approximation algorithm for the facility location problem. Consider an
instance $I$ of the problem with optimum cost OPT, and let $F$ and $C$ be the facility and connection
costs of the solution found by A. We call algorithm A a Lagrangian Multiplier Preserving $\alpha$-approximation
(or LMP $\alpha$-approximation for short) if for every instance $I$, $C \le \alpha(OPT - F)$. Jain
and Vazirani [22] show that an LMP $\alpha$-approximation algorithm for the metric facility location
problem gives rise to a $2\alpha$-approximation algorithm for the metric $k$-median problem. They have
noted that this result also holds for the $k$-facility location problem.



Lemma 14 [22] An LMP $\alpha$-approximation algorithm for the facility location problem gives a $2\alpha$-approximation
algorithm for the $k$-facility location problem.

Here we use Theorem 12 together with the scaling technique of Charikar and Guha to give an
LMP 2-approximation algorithm for the metric facility location problem based on Algorithm 2. This
will result in a 4-approximation algorithm for the metric $k$-facility location problem, whereas the
best previously known was 6 [22].



Lemma 15 Assume there is an algorithm A for the metric facility location problem such that for
every instance $I$ and every solution SOL for $I$, A finds a solution of cost at most $F_{SOL} + \alpha C_{SOL}$,
where $F_{SOL}$ and $C_{SOL}$ are facility and connection costs of SOL, and $\alpha$ is a fixed number. Then
there is an LMP $\alpha$-approximation algorithm for the metric facility location problem.

Proof: Consider the following algorithm: The algorithm constructs another instance $I'$ of the
problem by multiplying the facility opening costs by $\alpha$, runs A on this modified instance $I'$, and
outputs its answer. It is easy to see that this algorithm is an LMP $\alpha$-approximation. □
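
The scaling step in this proof can be sketched in a few lines of Python. Everything named here is hypothetical: `lmp_wrapper` is the scaling wrapper, and `brute_force` is an exhaustive search used only as a stand-in for the algorithm A (an exact algorithm trivially satisfies the hypothesis of the lemma for any $\alpha \ge 1$); it is not Algorithm 2.

```python
from itertools import combinations


def solution_cost(f, c, opened):
    """Return (facility cost, connection cost) of opening the given facilities."""
    fac = sum(f[i] for i in opened)
    conn = sum(min(c[i][j] for i in opened) for j in range(len(c[0])))
    return fac, conn


def brute_force(f, c):
    """Stand-in for algorithm A: exhaustive search over facility subsets."""
    best = None
    for size in range(1, len(f) + 1):
        for opened in combinations(range(len(f)), size):
            total = sum(solution_cost(f, c, opened))
            if best is None or total < best[0]:
                best = (total, opened)
    return best[1]


def lmp_wrapper(algorithm, f, c, a):
    """Run `algorithm` on the instance whose opening costs are scaled by a."""
    scaled = [a * fi for fi in f]
    return algorithm(scaled, c)


f = [4, 4, 1]
c = [[1, 1, 6, 6], [6, 6, 1, 1], [3, 3, 3, 3]]
opt = sum(solution_cost(f, c, brute_force(f, c)))
F, C = solution_cost(f, c, lmp_wrapper(brute_force, f, c, 2))
print(opt, F, C, C <= 2 * (opt - F))
```

On this toy instance the optimum costs 12, while the wrapped algorithm opens only the cheap facility, giving $F = 1$ and $C = 12 \le 2(12 - 1)$, as the LMP condition requires.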

Now we only need to prove the following. The proof of this theorem follows the general scheme that 
is explained in Section ^. 

Theorem 16 For every instance $I$ and every solution SOL for $I$, Algorithm 2 finds a solution of
cost at most $F_{SOL} + 2C_{SOL}$, where $F_{SOL}$ and $C_{SOL}$ are facility and connection costs of SOL.

Proof: By Theorem 12 we only need to prove that the solution of the factor-revealing LP 23 with
$\gamma_f = 1$ is at most 2. We first write the maximization program 23 as the following equivalent linear
program.



$$\text{maximize} \quad \sum_{i=1}^{k} \alpha_i - f \qquad (24)$$

$$\begin{array}{rll}
\text{subject to} & & \sum_{i=1}^{k} d_i = 1 \\
& \forall\, 1 \le i < k: & \alpha_i - \alpha_{i+1} \le 0 \\
& \forall\, 1 \le j < i < k: & r_{j,i+1} - r_{j,i} \le 0 \\
& \forall\, 1 \le j < i \le k: & \alpha_i - r_{j,i} - d_i - d_j \le 0 \\
& \forall\, 1 \le j < i \le k: & r_{j,i} - d_j - g_{i,j} \le 0 \\
& \forall\, 1 \le i \le j \le k: & \alpha_i - d_j - h_{i,j} \le 0 \\
& \forall\, 1 \le i \le k: & \sum_{j=1}^{i-1} g_{i,j} + \sum_{j=i}^{k} h_{i,j} - f \le 0 \\
& \forall\, i, j: & \alpha_j, d_j, f, r_{j,i}, g_{i,j}, h_{i,j} \ge 0
\end{array}$$

We need to prove an upper bound of 2 on the solution of the above LP. Since this program is
a maximization program, it is enough to prove the upper bound for any relaxation of the above
program. Numerical results (for a fixed value of $k$, say $k = 100$) suggest that removing the second,
third, and seventh inequalities of the above program does not change its solution. Therefore, we can
relax the above program by removing these inequalities. Now, it is a simple exercise to write down
the dual of the relaxed linear program and compute its optimal solution. This solution corresponds
to multiplying the third, fourth, fifth, and sixth inequalities of the linear program 24 by $1/k$, and
the first one by $(2 - 1/k)$, and adding up these inequalities. This gives an upper bound of $2 - 1/k$
on the value of the objective function. Thus, for $\gamma_f = 1$, we have $\gamma_c \le 2$. In fact, $\gamma_c$ is precisely
equal to 2, as shown by the following solution for the program 23.



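$$\alpha_i = \begin{cases} 2 - \frac{1}{k} & i = 1 \\ 2 & 2 \le i \le k \end{cases}
\qquad
d_i = \begin{cases} 1 & i = 1 \\ 0 & 2 \le i \le k \end{cases}$$

$$r_{j,i} = \begin{cases} 1 & j = 1 \\ 2 & 2 \le j \le k \end{cases}
\qquad
f = 2(k - 1)$$

□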



This example shows that the above analysis of the factor-revealing LP is tight. 
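
The tightness claim can be sanity-checked numerically. The Python sketch below uses hypothetical function names and assumes the solution $\alpha_1 = 2 - 1/k$, $\alpha_i = 2$ for $i \ge 2$, $d_1 = 1$, $d_i = 0$ for $i \ge 2$, $r_{1,i} = 1$, $r_{j,i} = 2$ for $j \ge 2$, and $f = 2(k-1)$. It builds that point, checks the constraints of program 23 (the monotonicity of $r_{j,i}$ in $i$ holds trivially because $r_{j,i}$ does not depend on $i$), and evaluates the objective with $\gamma_f = 1$.

```python
def tight_solution(k):
    """Candidate tight point for program 23 with gamma_f = 1."""
    alpha = [2 - 1 / k] + [2.0] * (k - 1)
    d = [1.0] + [0.0] * (k - 1)
    f = 2.0 * (k - 1)

    def r(j, i):
        return 1.0 if j == 1 else 2.0

    return alpha, d, f, r


def check_tight_solution(k, tol=1e-9):
    """Return (feasible, objective) for the candidate point."""
    alpha, d, f, r = tight_solution(k)
    a = lambda i: alpha[i - 1]   # 1-based accessors
    dd = lambda j: d[j - 1]
    ok = all(a(i) <= a(i + 1) + tol for i in range(1, k))
    ok = ok and all(
        a(i) <= r(j, i) + dd(i) + dd(j) + tol
        for i in range(1, k + 1)
        for j in range(1, i)
    )
    for i in range(1, k + 1):
        offer = sum(max(r(j, i) - dd(j), 0) for j in range(1, i))
        offer += sum(max(a(i) - dd(j), 0) for j in range(i, k + 1))
        ok = ok and offer <= f + tol
    objective = (sum(alpha) - f) / sum(d)
    return ok, objective


for k in (2, 5, 50):
    print(k, check_tight_solution(k))
```

For each $k$ tried, the point is feasible and its objective value equals $2 - 1/k$, which approaches 2 as $k$ grows.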

Lemma 15 and Theorem 16 provide an LMP 2-approximation algorithm for the metric facility location
problem. This result improves all the results in Jain and Vazirani [22], and gives straightforward
algorithms for some other problems considered by Charikar et al.

Notice that Theorem 13 shows that finding an LMP $(1 + \frac{2}{e} - \epsilon)$-approximation for the metric facility
location problem is hard. Also, the integrality gap examples found by Guha [14] show that Lemma
14 is tight. This shows that one cannot use Lemma 14 as a black box to obtain a smaller factor
than $2 + \frac{4}{e}$ for the $k$-median problem. Note that a $3 + \epsilon$ approximation is already known for the
problem. Hence, if one wants to beat this factor using the Lagrangian relaxation technique, then it
will be necessary to look into the underlying LMP algorithm, as has already been done by Charikar and
Guha.



8.2 Facility location game 

An important consideration, in cooperative game theory, while distributing the cost of a shared 
utility, is that the cost shares should satisfy the coalition participation constraint, i.e., the total 
cost share of any subset of the users shall not be larger than their stand-alone cost of receiving 
the service, so as to prevent this subset from seceding. In general, this turns out to be a stringent 
condition to satisfy. For the facility location problem, Goemans and Skutella showed that such 
a cost allocation is only possible for a very special case. Furthermore, intractability sets in as well, 
for instance, in the case of the facility location problem, computing the optimal cost of serving a set 
of users is NP-hard. 



In [24] Jain and Vazirani relax this notion: for a constant $k$, ensure that the cost share of any subset
is no more than $k$ times its stand-alone cost. They also observe that LP-based approximation
algorithms directly yield a cost sharing method compatible with this relaxed notion. However,
this involves solving an LP, as in the case of LP-rounding. We observe that our facility location
algorithms automatically yield such a cost sharing method, with $k = 1.861$ and $k = 1.61$ respectively,
by defining the cost share of city $j$ to be $\alpha_j$.



8.3 Arbitrary demands 

In this version, for each city $j$, a non-negative integer demand $d_j$ is specified. An open facility $i$
can serve this demand at the cost of $c_{ij} d_j$. The best way to look at this modification is to reduce
it to the unit demand case by making $d_j$ copies of city $j$. This reduction suggests that we need to
change our algorithms so that each city $j$ raises its contribution $\alpha_j$ at rate $d_j$. Note that the
modified algorithms still have the same running time in more general cases, where $d_j$ is fractional
or exponentially large, and achieve the same approximation ratio.



8.4 Fault tolerant facility location with uniform connectivity requirements 

We are given a connectivity requirement $r_j$ for each city $j$, which specifies the number of open
facilities that city $j$ should be connected to. We can see that this problem is closely related to the
set multi-cover problem, in the case that every set can be picked at most once [38]. The greedy
algorithm for set-cover can be adapted for this variant of the multi-cover problem achieving the
same approximation factor. We can use the same approach to deal with the fault tolerant facility
location: The mechanism of raising dual variables and opening facilities is the same as in our initial
algorithms. The only difference is that city $j$ stops raising its dual variable and withdraws its
contribution from other facilities, when it is connected to $r_j$ open facilities. We can show that when
all $r_j$'s are equal, our algorithms can still achieve the approximation factor of 1.861 and 1.61.



8.5 Facility location with penalties 

In this version we are not required to connect every city to an open facility; however, for each city $j$,
there is a specified penalty, $p_j$, which we have to pay if it is not connected to any open facility. We
can modify our algorithms for this problem as follows: If $\alpha_j$ reaches $p_j$ before $j$ is connected to any
open facility, the city $j$ stops raising its dual variable and keeps its contribution equal to its penalty
until it is either connected to an open facility or all remaining cities stop raising their dual variables.
At this point, the algorithm terminates and unconnected cities remain unconnected. Using the
linear programming formulation introduced in Charikar et al. (inequalities (4.6)-(4.10)), we can
show that the approximation ratio and running time of our modified algorithms have not changed.



8.6 Robust facility location 

In this variant, we are given a number $l$ and we are only required to connect $n_c - l$ cities to open
facilities. This problem can be reduced to the previous one via Lagrangian relaxation. Very recently,
Charikar et al. proposed a primal-dual algorithm, based on the JV algorithm, which achieves an
approximation ratio of 3. As they showed, the linear programming formulation of this variant has
an unbounded integrality gap. In order to fix this problem, they use the technique of parametric
pruning, in which they guess the most expensive facility in the optimal solution. After that, they
run the JV algorithm on the pruned instance, where the only allowable facilities are those that are not
more expensive than the guessed facility. Here we can use the same idea, using Algorithm 1 rather
than the JV algorithm. Using a proof similar to the proof of Theorem 3.2 in their paper, we can prove
that this algorithm solves the robust facility location problem with an approximation factor of 2. 



8.7 Dealing with capacities 

In real applications, it is not usually the case that the cost of opening a facility is independent of the
number of cities it will serve. But we can assume that we have economies of scale, i.e., the cost of
serving each city decreases when the number of cities increases (since publication of the first draft
of this paper, this problem has also been studied in [19]). In order to capture this property, we
define the following variant of the capacitated metric facility location problem. For each facility $i$,
there is an initial opening cost $f_i$. After facility $i$ is opened, it will cost $s_i$ to serve each city. This
variant can be solved using the metric uncapacitated facility location problem: We just have to change
the metric such that for each city $j$ and facility $i$, $c'_{ij} = c_{ij} + s_i$. Clearly, $c'$ is also a metric and the
solution of the metric uncapacitated version to this problem can be interpreted as a solution to the
original problem with the same cost.

We can reduce the variant of the capacitated facility location problem in which each facility can be
opened many times [22] to this problem by defining $s_i = f_i / u_i$. If in the solution to this problem $k$
cities are connected to facility $i$, we open this facility $\lceil k / u_i \rceil$ times. The cost of the solution will be
at most two times the original cost, so any $\alpha$-approximation for the uncapacitated facility location
problem can be turned into a $2\alpha$-approximation for this variant of the capacitated version. We can
also use the same technique as in [22] to give a factor 3-approximation algorithm for this problem
based on the LMP 2-approximation algorithm for the uncapacitated facility location problem.
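
The reduction described above amounts to a change of metric plus a rounding step. The Python sketch below illustrates both under stated assumptions: the function names are hypothetical, `u[i]` stands for the capacity $u_i$ of one copy of facility $i$, and the opening costs of the reduced instance are left unchanged.

```python
def soft_capacity_reduction(f, u, c):
    """Return (s, c_prime) with s_i = f_i / u_i and c'_{ij} = c_{ij} + s_i."""
    s = [fi / ui for fi, ui in zip(f, u)]
    c_prime = [[cij + s[i] for cij in row] for i, row in enumerate(c)]
    return s, c_prime


def copies_needed(k, u_i):
    """Number of times a facility of capacity u_i is opened to serve k cities."""
    return -(-k // u_i)   # integer ceiling of k / u_i


s, c_prime = soft_capacity_reduction([6], [3], [[1, 2]])
print(s, c_prime, copies_needed(4, 3))
```

In this toy case, serving $k = 4$ cities requires 2 copies at cost $2 \cdot 6 = 12$, which is at most $f_i + k s_i = 6 + 8 = 14$, illustrating why the rounding loses at most a factor of two.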



9 Discussion 

The method of dual fitting can be seen as an implementation of the primal-dual schema in which, 
instead of relaxing complementary slackness conditions (which is the most common way of
implementing the schema), we relax feasibility of the dual. However, we prefer to reserve the term
primal-dual for algorithms that produce feasible primal and dual solutions. 

Let us show how the combination of dual fitting with factor-revealing LP applies to the set cover
problem. The duality-based restatement of the greedy algorithm (see [44]) is: All elements in the
universal set $U$ increase their dual variables uniformly. Each element contributes its dual towards
paying for the cost of each of the sets it is contained in. When the total contribution offered to
a set equals its cost, the set is picked. At this point, the newly covered elements freeze their dual
variables and withdraw their contributions from all other sets. As stated in the introduction, the
latter (important) step ensures that the primal is fully paid for by the dual. However, we might
not get a feasible dual solution. To make the dual solution feasible we look for the smallest positive
number $Z$, so that when the dual solution is shrunk by a factor of $Z$, it becomes feasible. An upper
bound on the approximation factor of the algorithm is obtained by maximizing $Z$ over all possible
instances.

Clearly $Z$ is also the maximum factor by which any set is over-tight. Consider any set $S$. We
want to see what is the worst factor, over all sets and over all possible instances of the problem,
by which a set $S$ is over-tight. Let the elements in $S$ be $1, 2, \ldots, k$. Let $x_i$ be the dual variable
corresponding to the element $i$ at the end of the algorithm. Without loss of generality we may
assume that $x_1 \le x_2 \le \cdots \le x_k$. It is easy to see that at time $t = x_i^-$, the total duals offered to $S$ is at
least $(k - i + 1) x_i$. Therefore, this value cannot be greater than the cost of the set $S$ (denoted by
$c_S$). So, the optimum solution of the following mathematical program gives an upper bound on the
value of $Z$. (Note that $c_S$ is a variable, not a constant.)

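$$\text{maximize} \quad \frac{\sum_{i=1}^{k} x_i}{c_S} \qquad (25)$$

$$\begin{array}{rll}
\text{subject to} & \forall\, 1 \le i < k: & x_i \le x_{i+1} \\
& \forall\, 1 \le i \le k: & (k - i + 1)\, x_i \le c_S \\
& \forall\, 1 \le i \le k: & x_i \ge 0 \\
& & c_S \ge 1
\end{array}$$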



The above optimization program can be turned into a linear program by adding the constraint 
Cs = 1 and changing the objective function to J2i=i ^i- We call this linear program the factor- 
revealing LP. Notice that the factor-revealing LP has nothing to do with the LP formulation of 
the set cover problem; it is only used in order to analyze this particular algorithm. This is the 
important distinction between the factor-revealing LP technique, and other LP-based techniques in 
approximation algorithms. 
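
For set cover the factor-revealing LP can be solved by inspection. The short derivation below is a sketch added for illustration; it is consistent with the harmonic-number value quoted later in this section. With $c_S = 1$, the second family of constraints bounds each variable separately, so

$$
x_i \le \frac{1}{k - i + 1} \quad (1 \le i \le k)
\qquad\Longrightarrow\qquad
\sum_{i=1}^{k} x_i \;\le\; \sum_{i=1}^{k} \frac{1}{k - i + 1} \;=\; \sum_{j=1}^{k} \frac{1}{j} \;=\; H_k .
$$

The bound is attained by $x_i^* = 1/(k - i + 1)$, which is non-negative and non-decreasing in $i$ and therefore satisfies the remaining constraints. Hence the optimum of the LP is $H_k$, and since a set contains at most $n$ elements, $Z \le H_n$.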

One advantage of reducing the analysis of the approximation guarantee of an algorithm to obtaining 
an upper bound on the optimal solution to a factor-revealing LP is that one can introduce empirical
experimentation into the latter task. This can also help decide which aspects of the execution of
the algorithm to introduce into the factor-revealing LP to obtain the best possible bound on the
performance of the algorithm, e.g., we needed to introduce the variables $r_{j,i}$ in Section 5.1 in order
to get a good bound on the approximation ratio of Algorithm 2. 

In general, this technique is not guaranteed to yield a tight analysis of the algorithm, since the 
algorithm may be performing well not because of local reasons but for some global reasons that are 
difficult to capture in a factor-revealing LP. In the case of set cover, this method not only produces 
a tight analysis, but the factor-revealing LP also helps produce a tight example for the algorithm. 
From any feasible solution $x$ of the factor-revealing LP (25), one can construct the following instance:
There are $k$ elements $1, \ldots, k$, a set $S = \{1, \ldots, k\}$ of cost $1 + \epsilon$ which is the optimal solution, and
sets $S_i = \{i\}$ of cost $x_i$ for $i = 1, \ldots, k$. It is easy to verify that the greedy algorithm gives a solution
that is (up to the factor $1 + \epsilon$) $\sum_{i=1}^{k} x_i$ times worse than the optimal on this instance. Picking $x$ to be
the optimal solution, we get a tight example, and also show that the approximation ratio of the greedy
algorithm is precisely equal to $H_n$, the optimal solution of the factor-revealing LP.
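
The tight-example construction can be checked numerically. The sketch below is an illustration under stated assumptions rather than code from the paper: the helper names `tight_instance` and `greedy_cover_cost`, the choice $k = 4$, and the value $\epsilon = 1/100$ are all hypothetical. It takes $x_i = 1/(k - i + 1)$, the optimal solution of the factor-revealing LP, and runs the usual greedy rule (pick the set with the smallest cost per newly covered element).

```python
from fractions import Fraction


def tight_instance(k, eps):
    """Build the instance: one set S of cost 1 + eps and singletons S_i of cost x_i."""
    universe = set(range(1, k + 1))
    sets = {"S": frozenset(universe)}
    costs = {"S": 1 + eps}
    for i in range(1, k + 1):
        sets[f"S{i}"] = frozenset({i})
        costs[f"S{i}"] = Fraction(1, k - i + 1)  # x_i from the LP optimum
    return universe, sets, costs


def greedy_cover_cost(universe, sets, costs):
    """Greedy set cover: repeatedly pick the set minimizing cost / #newly covered."""
    uncovered = set(universe)
    total = Fraction(0)
    order = []
    while uncovered:
        name = min(
            (n for n in sets if sets[n] & uncovered),
            key=lambda n: Fraction(costs[n]) / len(sets[n] & uncovered),
        )
        total += Fraction(costs[name])
        order.append(name)
        uncovered -= sets[name]
    return total, order


k, eps = 4, Fraction(1, 100)
universe, sets, costs = tight_instance(k, eps)
total, order = greedy_cover_cost(universe, sets, costs)
harmonic = sum(Fraction(1, j) for j in range(1, k + 1))

print(order)                 # ['S1', 'S2', 'S3', 'S4']
print(total, harmonic)       # 25/12 25/12
print(total / costs["S"])    # 625/303
```

Greedy pays $H_4 = 25/12$ by picking every singleton, whereas the single set $S$ covers everything at cost $1 + \epsilon$; the ratio tends to $H_k$ as $\epsilon$ goes to 0.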

Finally, in terms of practical impact, what is the significance of improving the approximation
guarantee for facility location from 3 to 1.861 or 1.61 when practitioners are seeking algorithms that
come within 2% to 5% of the optimal? The superior experimental results of our algorithms, as 
compared with the JV algorithm, seem to provide the answer and to support the argument made 
in [44] (Preface, page IX) that the approximation factor should be viewed as a "measure that forces
us to explore deeper into the combinatorial structure of the problem and discover more powerful 
tools for exploiting this structure" and the observation that "sophisticated algorithms do have the 
error bounds of the desired magnitude, 2% to 5%, on typical instances, even though their worst case 
error bounds are much higher".